Stochastic binary hidden units in a multi-layer perceptron (MLP) network give at least three potential benefits when compared to deterministic MLP networks. (1) They make it possible to learn one-to-many mappings. (2) They can be used in structured prediction problems, where modeling the internal structure of the output is important. (3) Stochasticity has been shown to be an excellent regularizer, which can improve generalization performance. However, training stochastic networks is considerably more difficult. We study training using M samples of hidden activations per input. We show that the case M=1 leads to fundamentally different behavior, where the network tries to avoid stochasticity. We propose two new estimators for the training gradient and propose benchmark tests for comparing training algorithms. Our experiments confirm that training stochastic networks is difficult and show that the two proposed estimators perform favorably among all five known estimators.
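To make the training setup concrete, the following is a minimal sketch (not code from the paper) of gradient estimation for a single stochastic binary hidden layer using M sampled hidden activations per input. It uses the classic likelihood-ratio (score-function) estimator, one well-known baseline estimator for stochastic units; the network sizes, weight names, and toy squared-error loss are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Hypothetical toy setup: one stochastic binary hidden layer feeding a
    # linear output layer with squared-error loss. Names are illustrative.
    n_in, n_hid, n_out = 4, 8, 2
    W1 = rng.normal(scale=0.1, size=(n_hid, n_in))   # input -> hidden weights
    W2 = rng.normal(scale=0.1, size=(n_out, n_hid))  # hidden -> output weights

    def score_function_grad(x, y, M=10):
        """Monte Carlo (likelihood-ratio) gradient estimate for W1,
        averaged over M sampled binary hidden activations per input."""
        a = W1 @ x                  # hidden pre-activations
        p = sigmoid(a)              # Bernoulli firing probabilities
        gW1 = np.zeros_like(W1)
        for _ in range(M):
            h = (rng.random(n_hid) < p).astype(float)  # sample binary units
            loss = 0.5 * np.sum((W2 @ h - y) ** 2)     # loss for this sample
            # For Bernoulli(sigmoid(a)) units, d log P(h|x) / da = h - p,
            # so the score-function term for W1 is loss * outer(h - p, x).
            gW1 += loss * np.outer(h - p, x)
        return gW1 / M

    x = rng.normal(size=n_in)
    y = rng.normal(size=n_out)
    print(score_function_grad(x, y, M=10))

Increasing M reduces the variance of this estimate at the cost of more sampled forward passes per input; the M=1 case discussed above corresponds to training from a single sample. In practice the raw likelihood-ratio estimate is high-variance, which is one reason better gradient estimators are of interest.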